Sketching Word Vectors Through Hashing
Authors
Abstract
We propose a new fast word embedding technique based on hash functions. The method is a derandomization of a new type of random projection: by disregarding the classic constraint used in designing random projections (i.e., preserving pairwise distances in a particular normed space), our solution exploits extremely sparse non-negative random projections. Our experiments show that the proposed method achieves competitive results, comparable to those of neural embedding techniques, yet at only a fraction of their computational cost. While the proposed derandomization improves the computational and space efficiency of our method, the possibility of applying weighting schemes such as positive pointwise mutual information (PPMI) to our models after their construction (and at a reduced dimensionality) imparts a high discriminatory power to the resulting embeddings. Naturally, the method also comes with the other known benefits of random projection-based techniques, such as ease of update.
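The pipeline the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the choice of exactly `nnz` non-zero entries per row, and the dense NumPy representation are all assumptions made for clarity.

```python
import numpy as np

def sparse_nonneg_projection(vocab_size, dim, nnz=2, seed=0):
    """Extremely sparse, non-negative random projection: each input
    dimension is mapped onto `nnz` randomly chosen output dimensions
    with weight 1 (no signs, no distance-preservation constraint)."""
    rng = np.random.default_rng(seed)
    proj = np.zeros((vocab_size, dim))
    for i in range(vocab_size):
        cols = rng.choice(dim, size=nnz, replace=False)
        proj[i, cols] = 1.0
    return proj

def ppmi(m, eps=1e-12):
    """Positive pointwise mutual information applied to a count-like
    matrix: PMI(i, j) = log(m_ij * total / (row_i * col_j)), clamped
    at zero. Because the projection is non-negative, the reduced
    matrix still behaves like counts, so PPMI remains applicable."""
    total = m.sum()
    row = m.sum(axis=1, keepdims=True)
    col = m.sum(axis=0, keepdims=True)
    pmi = np.log((m * total + eps) / (row @ col + eps))
    return np.maximum(pmi, 0.0)

# Usage: project raw co-occurrence counts down, then weight the
# reduced matrix with PPMI.
counts = np.random.default_rng(1).integers(0, 5, size=(10, 10)).astype(float)
proj = sparse_nonneg_projection(vocab_size=10, dim=4, nnz=2, seed=0)
embeddings = ppmi(counts @ proj)
```

The key point the sketch illustrates is the ordering: the expensive weighting step runs on the already-reduced matrix, which is where the claimed efficiency gain comes from.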
Similar Resources
2.3 Sketching using Locality Sensitive Hashing
In this lecture we will get to know several techniques that can be grouped under the general definition of sketching. When using the sketching technique, each element is replaced by a more compact representation of itself, and an alternative algorithm is run on the more compact representations. Finally, one has to show that this algorithm gives the same result as the original algorithm with high probab...
FROSH: FasteR Online Sketching Hashing
Many hashing methods, especially those that are in the data-dependent category with good learning accuracy, are still inefficient when dealing with three critical problems in modern data analysis. First, data usually come in a streaming fashion, but most of the existing hashing methods are batch-based models. Second, when data become huge, the extensive computational time, large space requireme...
Hash2Vec, Feature Hashing for Word Embeddings
In this paper we propose the application of feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks like document classification. In this work we show that feature hashing can be applied to obtain word embeddings in linear time with the size of the data. The results show that this algorithm...
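The feature hashing (hashing trick) idea this snippet refers to can be illustrated in a few lines. This is a generic sketch, not the Hash2Vec implementation: the use of MD5, the signed-bucket scheme, and the `dim=8` default are assumptions chosen to keep the example deterministic and self-contained.

```python
import hashlib

def hashed_context_vector(context_words, dim=8):
    """Hashing trick: accumulate each context word into one of `dim`
    buckets chosen by a hash function, with a hash-derived sign to
    reduce the bias from collisions. No vocabulary is stored, so the
    cost is linear in the size of the data."""
    vec = [0.0] * dim
    for w in context_words:
        h = int(hashlib.md5(w.encode()).hexdigest(), 16)
        idx = h % dim                          # bucket index
        sign = 1.0 if (h // dim) % 2 == 0 else -1.0
        vec[idx] += sign
    return vec

# Usage: the vector for a word is built from its observed contexts.
v = hashed_context_vector(["the", "cat", "sat", "the"])
```

Because the mapping is a fixed hash rather than a learned lookup table, new words can be folded in at any time without growing the model.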
Hash Embeddings for Efficient Word Representations
We present hash embeddings, an efficient method for representing words in a continuous vector form. A hash embedding may be seen as an interpolation between a standard word embedding and a word embedding created using a random hash function (the hashing trick). In hash embeddings, each token is represented by k d-dimensional embedding vectors and one k-dimensional weight vector. The final d dim...
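The lookup described in this snippet, where k hash functions select k shared component vectors that a per-token weight vector mixes together, can be sketched as below. This is an illustrative sketch under stated assumptions, not the paper's code: `stable_hash` (CRC32-based), the pool and weight-table sizes, and drawing the weights from a hashed table rather than training them are all simplifications.

```python
import zlib
import numpy as np

def stable_hash(s):
    """Deterministic string hash (CRC32), stable across runs."""
    return zlib.crc32(s.encode())

def hash_embedding(token, pool, weight_table, k=2):
    """k hash functions pick k component vectors from a shared pool;
    a small per-token weight vector combines them into the final
    d-dimensional embedding. Memory is dominated by the pool, which
    is much smaller than a full vocabulary-sized embedding table."""
    n_pool, d = pool.shape
    comps = np.stack([pool[stable_hash(f"{j}:{token}") % n_pool]
                      for j in range(k)])          # shape (k, d)
    w = weight_table[stable_hash(token) % len(weight_table)]  # shape (k,)
    return w @ comps                               # shape (d,)

# Usage: a tiny shared pool and weight table stand in for trained ones.
rng = np.random.default_rng(0)
pool = rng.normal(size=(16, 5))
weight_table = rng.normal(size=(8, 2))
e = hash_embedding("hello", pool, weight_table, k=2)
```

In the actual method the pool and the weight vectors are trained parameters; the sketch only shows the indexing and combination structure.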
Predicting IPO Performance from Nearest Neighbors Using TF-IDF Weighted Word Count Vectors
We introduce a novel approach to mining and leveraging data concerning stocks in order to predict the performance of new stocks following their initial public offering, a traditionally difficult task due to the lack of information and historical performance data. We collect a large corpus of articles for every existing stock between March 1st, 2014 and March 1st, 2015. We create weighted featur...
Journal: CoRR
Volume: abs/1705.04253
Issue: -
Pages: -
Publication date: 2017